Romanian Zero Pronoun Distribution: A Comparative Study
Abstract
Anaphora resolution remains a challenging research field in natural language processing, which still lacks an algorithm that reliably resolves anaphoric pronouns. Anaphoric zero pronouns pose an even greater challenge, since this category is not lexically realised; their resolution therefore depends on first identifying them. This paper reports on the distribution of zero pronouns in Romanian across four genres: encyclopaedic, legal, literary, and news-wire texts. For this purpose, the RoZP corpus was created, containing almost 50,000 tokens and 800 manually annotated zero pronouns. The distribution patterns are compared across genres, and exceptional cases are presented to inform the development of a future zero pronoun identification and resolution algorithm. The evaluation results show that zero pronouns appear frequently in Romanian and that their distribution depends largely on the genre. Additionally, possible features for their identification are revealed, and a search scope for the antecedent is determined, increasing the chances of correct resolution.
Similar resources
To Be or Not to Be a Zero Pronoun: a Machine Learning Approach for Romanian
This paper presents a new study on the distribution and identification of zero pronouns in Romanian. A Romanian corpus that includes legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments have been performed on the created corpus for the identification of verbs ...
A Comparative Study of Spanish Zero Pronoun Distribution
The aim of this paper is to report the distribution of Spanish zero pronouns in three different genres: legal, encyclopaedic and instructional. The Z-corpora were created for this purpose and a sample of 1043 zero pronouns was annotated. The most salient patterns of distribution are compared for each genre, and some relevant issues concerning the use of zero pronouns are described in relation t...
The Impact of Zero Pronominal Anaphora on Translational Language: a Study on Romanian Newspapers
This study investigates the impact of zero pronominal anaphora for Romanian on a learning model able to distinguish between translated and non-translated texts. Even though the correct understanding of ellipsis from the source language and its mapping into the target language is essential in the translation process, zero pronominal anaphora has been scarcely investigated in the context of trans...
Resolving Romanian Zero Pronouns: A Machine Learning Approach
This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...
Zero Pronominal Anaphora Resolution for the Romanian Language
This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...